Abstract

Minimization of the L∞ norm, which can be viewed as approximately solving the non-convex least median estimation problem, is a powerful method for outlier removal and hence robust regression. However, current techniques for solving the problem at the heart of L∞ norm minimization are slow, and therefore cannot scale to large problems. A new method for the minimization of the L∞ norm is presented here, which provides a speedup of multiple orders of magnitude for data with high dimension. This method, termed Fast L∞ Minimization, allows robust regression to be applied to a class of problems which were previously inaccessible. It is shown how the L∞ norm minimization problem can be broken up into smaller sub-problems, which can then be solved extremely efficiently. Experimental results demonstrate the radical reduction in computation time, along with robustness against large numbers of outliers in a few model-fitting problems.

Keywords: Least-squares regression, outlier removal, robust regression, face recognition
1. Introduction

Linear least-squares (LS) estimation, or linear L2 norm minimization (denoted as L2 for brevity throughout this paper), is widely used in computer vision and image analysis due to its simplicity and efficiency. Recently the L2 norm technique has been applied to recognition problems such as face recognition [1], [4]. All of these methods are linear regression-based and the regression residual is utilized to make the final classification. However, a small number of outliers can drastically bias L2, leading to low quality estimates. Clearly, robust regression techniques are critical when outliers are present.

The literature contains a range of different approaches to robust regression. One commonly used method is the M-estimator framework [3], [4], where the Huber function is minimized, rather than the conventional L2 norm. Related methods include L-estimators [5] and R-estimators [6]. One drawback of these methods is that they are still vulnerable to bad leverage outliers [7]. By bad leverage points, we mean those observations which are outlying in x-space and do not follow the linear pattern of the majority. Least median of squares (LMS) [8], least trimmed squares (LTS) [9] and the technique using data partition and M-estimation [10] have high-breakdown points. Although each of these regression methods is, in general, more robust than L2, they have rarely been applied to object recognition problems in computer vision due to their computational expense.
Another class of robust methods has been developed to remove these abnormal observations from the measurement data. One of the most popular methods is RANSAC [13], which attempts to maximise the size of the consensus set. RANSAC relies on iterative random sampling and consensus testing, where the size of each sample is determined by the minimum number of data points required to compute a single solution. RANSAC's efficiency is therefore directly tied to the time and number of data points required to compute a solution. For example, RANSAC has been successfully applied to multiview structure-from-motion and homography estimation problems. However, it is unclear how to apply RANSAC to visual recognition problems, e.g., face recognition, where face images usually lie in a high-dimensional space.

Sim and Hartley [14] proposed an outlier-removing method using the L∞ norm, which iteratively fits a model to the data and removes the measurement with the largest residual at each iteration. Generally, the iterative method can fail for L2 optimization problems; however, it is valid for a wide class of L∞ problems. Sim and Hartley proved that the set of measurements with largest residual must contain at least one outlier. Hence continuing to iterate eventually removes all the outliers. This method is shown to be effective in outlier detection for multiview geometry problems [14], [15].

L∞ norm minimization can be time-consuming, since at each step one needs to solve an optimization problem via Second-Order Cone Programming (SOCP) or Linear Programming (LP) in the application of multi-view geometry [16], [17]. The software package SeDuMi [18] provides solvers for both SOCP and LP problems.

In this paper, we propose a fast algorithm to minimize the L∞ norm for approximating the least median estimation (denoted as L∞ for brevity throughout the paper). Observing that the L∞ norm is determined by only a few measurements, the optimization strategy column generation (CG) [19] can be applied to reduce the main problem into a set of much smaller sub-problems. Each sub-problem can be formulated as a Quadratic Programming (QP) problem. Due to its relatively small size, the QP problem can be solved extremely efficiently using customized solvers. In particular, we can generate solvers using the technique introduced by Mattingley and Boyd [20]. This reduction results in a speedup of several orders of magnitude for high dimensional data.

This degree of speedup allows L∞ to be applied to problems which were previously inaccessible. We show how the L∞ outlier removal technique can be applied to several classification problems in computer vision. Representations of objects in this type of problem are often derived by using L2 to solve equations containing samples in the same class (or other collaborative classes) [1], [4]. Representation errors are then used for classification, where the query object is assigned to the class corresponding to the minimal residual. This method is shown to be effective on data without occlusion or outliers. However, in real-world applications, measurement data are almost always contaminated by noise or outliers. Before a robust representation can be obtained via linear estimation, outlier removal is necessary. The proposed method is shown to significantly improve the classification accuracies in our experiments for face recognition and iris recognition on several public datasets.



2. Related work 



Hartley and Schaffalitzky [21] seek a globally optimal solution for multi-view geometry problems via L∞ norm optimization, based on the fact that many geometry problems in computer vision have a single local, and hence global, minimum under the L∞ norm. In contrast, the commonly used L2 cost function typically has multiple local minima [14], [21]. This work has been extended by several authors, yielding a large set of geometry problems whose globally optimal solution can be found using the L∞ norm (Olsson provides a summary [22]).

It was observed that these geometry problems are examples of quasiconvex optimization problems, which are typically solved by a sequence of SOCPs using a bisection (binary search) algorithm [16], [17]. Olsson et al. [23] show that the functions involved in the L∞ norm problems are in fact pseudoconvex, which is a stronger condition than quasiconvex. As a consequence, several fast algorithms have been proposed [22], [23].

Sim and Hartley [14] propose a simple method based on the L∞ norm for outlier removal, where measurements with maximal residuals are thrown away. The authors prove that at least one outlier is removed at each iteration, meaning that all outliers will be rejected in a finite number of iterations. However, the method is not efficient, since one needs to solve a sequence of SOCPs. Observing that many fixed-dimensional L∞ geometry problems are actually instances of LP-type problems, an LP-type framework was proposed for the multi-view triangulation problem with outliers [24].

Recently, the Lagrange dual problem of the L∞ minimization problem posed in [21] was derived in [15]. To further boost the efficiency of the method, the authors of [15] proposed an L1-minimization algorithm for outlier removal. While the aforementioned methods add a single slack variable and repeatedly solve a feasibility problem, the L1 algorithm adds one slack variable for each residual and then solves a single convex program. While efficient, this method is only successful on data drawn from particular statistical distributions.

Robust statistical techniques, including the aforementioned robust regression and outlier removal methods, can significantly improve the performance of their classic counterparts. However, they have rarely been applied in the image analysis field, to problems such as visual recognition, due to their computational expense. The M-estimator method is utilized in [11] for face recognition and achieved high accuracy even when illumination change and pixel corruption were present. In [25], the authors propose a theoretical framework combining reconstructive and discriminative subspace methods for robust classification and regression. This framework acts on subsets of pixels in images to detect outliers.

The remainder of this paper is organized as follows. In Section 3 we briefly review the L2 and L∞ problems. In Section 4 the main outlier removal algorithm is presented. In Section 5 we formulate the L∞ norm minimization problem into a set of small sub-problems which can be solved with high efficiency. We then apply the outlier removal technique in Section 6 to several visual recognition applications. Finally the conclusion is given in Section 8.

3. The L2 and L∞ norm minimization problems

In this section, we briefly present the L∞ norm minimization problem in the form we use in several recognition problems. Let us first examine the L2 norm minimization problem,

$$\min_{\beta} \sum_{i=1}^{n} (a_i \beta - y_i)^2, \tag{1}$$

for which we have a closed-form solution^1

$$\beta_{ls} = (A^{\top} A)^{-1} A^{\top} y, \tag{2}$$

where $A = [a_1; \ldots; a_n] \in \mathbb{R}^{n \times d}$ is the measurement data matrix, composed of rows $a_i \in \mathbb{R}^{d}$, and usually $n \gg d$. The model's response is represented by the vector $y \in \mathbb{R}^{n}$ and $\beta \in \mathbb{R}^{d}$ stores the parameters to be estimated. Note that in our visual recognition applications, both $y$ and the columns of $A$ are images flattened to vectors. According to the linear subspace assumption [26], a probe image can be approximately represented by a linear combination of the training samples of the same class: $y \approx A\beta$. Due to its simplicity and efficacy, the linear representation method is widely used in various image analysis applications.

^1 The closed-form solution can only be obtained when A is over-determined, i.e., n > d. When n < d, one can solve the multi-collinearity problem by ridge regression, or another variable selection method, to obtain a unique solution.

The L2 norm minimization aims to minimize the sum of squared residuals (1), where the terms $f_i(\beta) = (a_i \beta - y_i)^2$, $i \in I = \{1, \ldots, n\}$, are the squared residuals. L2 norm minimization is simple and efficient; however, it utilizes the entire data set and therefore can be easily influenced by outliers.
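The sensitivity of the closed-form solution (2) to outliers can be illustrated with a small numerical sketch. The data, the helper name `l2_fit`, and the outlier magnitude below are assumptions made for illustration only; `numpy.linalg.lstsq` is used in place of explicitly forming $(A^{\top}A)^{-1}$ because it is numerically safer while giving the same estimate for a full-rank, over-determined A.

```python
import numpy as np

def l2_fit(A, y):
    """Least-squares estimate of beta, equivalent to (A^T A)^{-1} A^T y for full-rank A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 100
# Hypothetical line model y = 2x + 1 with a column of ones for the intercept.
A = np.column_stack([rng.uniform(-5, 5, n), np.ones(n)])
beta_true = np.array([2.0, 1.0])
y = A @ beta_true + 0.1 * rng.standard_normal(n)

beta_clean = l2_fit(A, y)

# Corrupt ten responses with large one-sided errors and refit.
y_bad = y.copy()
y_bad[:10] += 50.0
beta_bad = l2_fit(A, y_bad)

print("clean fit:    ", beta_clean)
print("with outliers:", beta_bad)
```

With only 10% of the responses corrupted, the intercept estimate moves far from its true value, which is the behaviour the paragraph above describes.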

Instead of minimizing the sum of squared residuals, the L∞ norm minimization method seeks to minimize only the maximal residual, leading to the following formulation:

$$\min_{\beta} \max_{i} \; (a_i \beta - y_i)^2, \quad i \in I. \tag{3}$$

This equation has no closed-form solution; however, it can be easily reformulated into a constrained formulation, with an auxiliary variable $s$:

$$\min_{\beta, s} \; s \quad \text{s.t.} \quad (a_i \beta - y_i)^2 \le s, \quad \forall i \in I, \tag{4}$$

which is clearly a SOCP problem. If we take the absolute value of the residual in (3), we obtain

$$\min_{\beta} \max_{i} \; |a_i \beta - y_i|, \quad i \in I, \tag{5}$$

leading to an LP problem

$$\min_{\beta, s} \; s \quad \text{s.t.} \quad a_i \beta - y_i \le s, \quad a_i \beta - y_i \ge -s, \quad \text{for all } i \in I. \tag{6}$$

A critical advantage of the L∞ norm cost function is that it has a single global minimum in many multi-view geometry problems [21], [23]. Unfortunately, like L2 norm minimization, the L∞ norm minimization method is also vulnerable to outliers. Moreover, minimizing the L∞ norm fits to the outliers, rather than the data truly generated by the model [14]. Therefore, it is necessary to first reject outliers before the estimation.
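As a rough illustration of the LP (6), the following sketch stacks the unknowns as $(\beta, s)$ and passes the two families of inequality constraints to SciPy's `linprog`. The function name `linf_fit` and the toy data are assumptions for this example; the paper itself uses SeDuMi and CVXGEN rather than SciPy.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Solve min_beta max_i |a_i beta - y_i| as the LP (6); returns (beta, s)."""
    n, d = A.shape
    c = np.r_[np.zeros(d), 1.0]                  # objective: minimize s
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),     #  a_i beta - s <=  y_i
                      np.hstack([-A, -ones])])   # -a_i beta - s <= -y_i
    b_ub = np.r_[y, -y]
    bounds = [(None, None)] * d + [(0, None)]    # beta free, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

# Three points on the line y = 0 plus one gross outlier at (1, 4).
x = np.array([0.0, 1.0, 2.0, 1.0])
y = np.array([0.0, 0.0, 0.0, 4.0])
A = np.column_stack([x, np.ones_like(x)])
beta, s = linf_fit(A, y)
print(beta, s)  # approximately slope 0, intercept 2, and s = 2
```

The minimax line sits halfway between the inliers and the single outlier, which shows concretely why the outliers must be rejected before the final estimate is computed.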



4. Outlier removal via maximum residual 

In [14], outlier removal is conducted in an iterative fashion by first minimizing the L∞ norm, then removing the measurements with maximum residual and then repeating. The measurements with maximum residual are referred to as the support set of the minimax problem, i.e.,

$$I_{supp} = \{\, i \in I \mid f_i(\beta^*) = \delta_{opt} \,\}, \tag{7}$$

where $\delta_{opt} = \min_{\beta} \max_{i \in I} f_i(\beta)$ is the optimum residual of the minimax problem. The outlier removal strategy does not work well for general L2 minimization problems (i.e., $\delta_{opt} = \min_{\beta} \sum_{i} f_i(\beta)$), because the outliers are not guaranteed to be included in the support set. In contrast, this strategy is valid for L∞ minimization problems. For problem (3) or (5), it is proved by the following theorems that the measurements with largest residual must contain at least one outlier.

Suppose the index set $I$ is composed of $I_{in}$ and $I_{out}$, the inlier and outlier sets respectively, and there exists $\delta_{in}$ such that $\min_{\beta} \max_{i \in I_{in}} f_i(\beta) < \delta_{in}$. Then we have the following theorem.

Theorem 1. [14] Consider the L∞ norm minimization problem (3) or (5) with the optimal residual $\delta_{opt} = \min_{\beta} \max_{i \in I} f_i(\beta)$. If there exists an inlier subset $I_{in} \subset I$ for which $\min_{\beta} \max_{i \in I_{in}} f_i(\beta) < \delta_{in} < \delta_{opt}$, then the support set $I_{supp}$ must contain at least one index $i \in I_{out}$, that is, an outlier.

Following Theorem 2 in [14], Theorem 1 can be easily proved based on the following theorem.

Theorem 2. If $i_0$ is not in the support set $I_{supp}$ for the minimax problem (3) or (5), then removing the measurement with respect to $i_0$ will not decrease the optimal residual $\delta_{opt}$. Formally, if $i_0 \notin I_{supp}$, then

$$\min_{\beta} \max_{i \in I \setminus i_0} f_i(\beta) = \min_{\beta} \max_{i \in I} f_i(\beta) = \delta_{opt}. \tag{8}$$

Proof. It is not difficult to verify that both residual error functions, $(a_i \beta - y_i)^2$ and $|a_i \beta - y_i|$, are convex, and therefore also quasiconvex [31]. Furthermore, these two error functions are also strictly convex, and hence also strictly quasiconvex. Then, due to Corollary 1 in [14] (omitted here), the theorem holds. □

At each iteration, we first obtain the optimal parameters $\beta^*$ by solving (3) or (5) and then remove the measurements (pixels in images) corresponding to the largest residual. If we continue the iteration, all outliers are eventually removed.

As with all outlier removal processes, there is a risk that discarding a set of outliers will remove some inliers at the same time. In this framework, the outliers are individual pixels, which are in good supply in visual recognition applications. For example, a face image will typically contain hundreds or thousands of pixels. Removing a small fraction of the good pixels is therefore unlikely to affect recognition performance. However, if too many pixels are removed, the remaining pool may be too small for successful recognition. Therefore, we propose a process to restore incorrectly removed pixels where possible, as part of the overall outlier removal algorithm listed in Algorithm 1. In practice, the heuristic remedy step does improve the performance of our method on the visual recognition problems in our experiments. Also note that it is impossible that all points in the support set are moved back in step 6, because, based on Theorem 2, we can prove that

$$\max_{i \in I_{supp}} f_i(\beta^*) \;\ge\; \min_{\beta} \max_{i \in I_{supp}} f_i(\beta) \;=\; \min_{\beta} \max_{i \in I_t \cup I_{supp}} f_i(\beta) \;\ge\; \min_{\beta} \max_{j \in I_t} f_j(\beta),$$

where $I_t$ is the index set after step 5 and $\beta^*$ is the solution obtained in step 6. The right-hand side is the optimal residual $\delta_{opt}$ of step 6, so at least one index in $I_{supp}$ has a residual no smaller than $\delta_{opt}$ and is therefore not moved back.

Algorithm 1 Outlier removal using the L∞ norm

1: Input: the measurement data matrix $A \in \mathbb{R}^{n \times d}$; the response vector $y \in \mathbb{R}^{n}$; outlier percentage $p$.
2: Initialization: $l \leftarrow 0$; number of measurements to be removed $L \leftarrow \lfloor n \times p \rfloor$; index set $I_t \leftarrow \{1, \ldots, n\}$.
3: while $l < L$ do
4:   Solve the L∞ norm minimization problem: $\delta_{opt} = \min_{\beta} \max_{i \in I_t} f_i(\beta)$; get the support set $I_{supp}$ via equation (7).
5:   Remove the measurements with indices in $I_{supp}$, i.e., $I_t \leftarrow I_t \setminus I_{supp}$.
6:   Remedy. Solve the minimax problem again with the new index set $I_t$ and get the optimal residual $\delta_{opt}$ and parameter $\beta^*$. Move the indices $I_r = \{ i \in I_{supp} \mid f_i(\beta^*) < \delta_{opt} \}$ back to $I_t$.
7:   $l \leftarrow l + |I_{supp}|$.
8: end while
9: Output: $A_t$ and $y_t$ with measurement index $i \in I_t$.
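The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under several assumptions: it uses the absolute residual (5), solves each minimax problem with SciPy's `linprog` rather than the solvers used in the paper, introduces a numerical tolerance `tol` to decide membership of the support set, and the names `linf_fit` and `remove_outliers` are hypothetical. The toy data fit a constant model, for which the minimax solution is simply the midrange of the responses.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Minimax fit via the LP (6); returns (beta, optimal residual)."""
    n, d = A.shape
    c = np.r_[np.zeros(d), 1.0]
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.r_[y, -y]
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

def remove_outliers(A, y, p, tol=1e-7):
    """Sketch of Algorithm 1: returns the indices kept as inliers."""
    n = A.shape[0]
    L = int(np.floor(n * p))          # number of measurements to remove
    keep = np.arange(n)               # I_t
    removed = 0                       # counter l
    while removed < L:
        # Step 4: minimax fit on the current set and its support set (7).
        beta, s = linf_fit(A[keep], y[keep])
        res = np.abs(A[keep] @ beta - y[keep])
        supp = keep[res >= s - tol]
        # Step 5: remove the support set.
        keep = np.setdiff1d(keep, supp)
        # Step 6 (remedy): refit and restore points that now fit well.
        beta, s = linf_fit(A[keep], y[keep])
        back = supp[np.abs(A[supp] @ beta - y[supp]) < s - tol]
        keep = np.union1d(keep, back)
        # Step 7: the counter grows by the support-set size, as in Algorithm 1.
        removed += len(supp)
    return keep

# Constant model: eight inliers in [0, 0.7] and two outliers at 10 and 12.
y = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 10.0, 12.0])
A = np.ones((len(y), 1))
keep = remove_outliers(A, y, p=0.4)
print(keep)  # [2 3 4 5 6 7]
```

In this toy run both outliers are removed in two iterations, together with the two extreme inliers that shared the support set with them; the remedy step restores nothing because those inliers still lie on the boundary of the refitted model.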



5. A fast algorithm for the L∞ norm minimization problem

Recalling Theorem 2, we may remove any data not in the support set without changing the value of the L∞ norm. This property allows us to subdivide the large problem into a set of smaller sub-problems. We will proceed by first presenting a useful definition of pseudoconvexity:

Definition 1. A function $f(\cdot)$ is called pseudoconvex if $f(\cdot)$ is differentiable and $\nabla f(\bar{x})^{\top} (x - \bar{x}) \ge 0$ implies $f(x) \ge f(\bar{x})$.

In this definition, $f(\cdot)$ has to be differentiable. However, the notion of pseudoconvexity can be generalized to non-differentiable functions [32]:

Definition 2. A function $f(\cdot)$ is called pseudoconvex if for all $x, y \in \mathrm{dom}(f)$:

$$\exists\, x^* \in \partial f(x) : \langle x^*, y - x \rangle \ge 0 \;\Longrightarrow\; \forall z \in [x, y) : f(z) \le f(y), \tag{9}$$

where $\partial f(x)$ is the subdifferential of $f$ at $x$.

Both of these definitions share the property that any local minimum of a pseudoconvex function is also a global minimum. Based on the first definition, it has been proved that if the residual error functions $f_i(\cdot)$ are pseudoconvex and differentiable, the cardinality of the support set is not larger than $d + 1$ [23], [33]. Following the proof in [23], one can easily validate the following corollary:



Corollary 1. For the minimax problem with pseudoconvex residual functions (differentiable or not), there must exist a subset $I_s \subset I$, $|I_s| \le d + 1$, such that

$$\min_{\beta} \max_{i \in I_s} f_i(\beta) = \min_{\beta} \max_{i \in I} f_i(\beta). \tag{10}$$

It is clear that the squared residual functions $f_i(\beta) = (a_i \beta - y_i)^2$ in (3) are convex and differentiable, hence also pseudoconvex. The absolute residual function in (5), $f_i(\beta) = |a_i \beta - y_i|$, is sub-differentiable. It is easy to verify that the only non-differentiable point, the origin, satisfies the second definition. Therefore function (5) also satisfies Corollary 1.

The above corollary says that we can solve a sub-problem with at most $d + 1$ measurements without changing the estimated solution to the original minimax problem. However, before solving the sub-problems, we should first determine the support set. We choose to solve a set of small sub-problems using an optimization method called column generation (CG) [19]. The CG method adds one constraint at a time to the current sub-problem until an optimal solution is obtained.

The process is as follows: We first choose $d + 1$ measurements not contained in the support set. These data are then used to compute a solution, and residuals for all the data in the main problem are determined. Then the most violated constraint, or the measurement corresponding to the largest residual, is added to the sub-problem. The sub-problem is then too large, therefore we solve the sub-problem again (now with size $d + 2$) and remove an inactive measurement. Through this strategy, the problem is prevented from growing too large, and violating Corollary 1. When there are no violated constraints, we have obtained the optimal solution.

The proposed fast method is presented in Algorithm 2. We divide the data into an active set, corresponding to a sub-problem, and the remaining set with the L2 norm minimization. This algorithm allows us to solve the original problem with the measurement matrix of size $n \times d$, by solving a series of small problems with size $(d + 1) \times d$ or $(d + 2) \times d$. In most visual recognition problems, $n \gg d$. In all of our experiments, the algorithm typically converges in fewer than 30 iterations. We will show that this strategy radically improves computational efficiency.

For maximal efficiency, we choose to solve the LP problem (6), and utilize the code generator CVXGEN [20] to generate custom, high speed solvers for the sub-problems in Algorithm 2. CVXGEN is a software tool that automatically generates customized C code for LP or QP problems of modest size. CVXGEN is less effective on large problems (e.g., > 2000 variables). However, in Algorithm 2 we convert the original problem into a set of small sub-problems, which can be efficiently solved with the generator. CVXGEN embeds the problem size into the generated code, restricting it to fixed-size problems [20]. The proposed method only ever solves problems of size $d + 1$ or $d + 2$, enabling the use of CVXGEN.



Algorithm 2 A fast algorithm for the L∞ problem

1: Input: the measurement data matrix $A \in \mathbb{R}^{n \times d}$; the response vector $y \in \mathbb{R}^{n}$; maximum iteration number $l_{max}$.
2: Initialization: Initialize the active set $I_s$ with indices corresponding to the largest $d + 1$ absolute residuals from the vector $\{ f_i(\beta_{ls}) \}$, $i \in I$, using the LS solution as in (2); set $I_r \leftarrow I \setminus I_s$; set iteration counter $l \leftarrow 0$.
3: Solve the L∞-minimization sub-problem

$$\beta^* = \arg\min_{\beta} \max_{i \in I_s} f_i(\beta) \tag{11}$$

   and set $t^* \leftarrow \max_{i \in I_s} f_i(\beta^*)$.
4: while $l < l_{max}$ do
5:   Get the most violated measurement from the remaining set $I_r$: $i_m = \arg\max_{i \in I_r} f_i(\beta^*)$.
6:   Check for optimal solution: if $f_{i_m}(\beta^*) \le t^*$, then break (problem solved).
7:   Update the active and remaining set: $I_s \leftarrow I_s \cup \{i_m\}$, $I_r \leftarrow I_r \setminus \{i_m\}$.
8:   Solve the L∞-minimization sub-problem in (11).
9:   Move the inactive measurement index $i_d = \arg\min_{i \in I_s} f_i(\beta^*)$ to the remaining set: $I_s \leftarrow I_s \setminus \{i_d\}$, $I_r \leftarrow I_r \cup \{i_d\}$.
10:  $l \leftarrow l + 1$.
11: end while
12: Output: $\beta^*$.
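The exchange procedure of Algorithm 2 can be sketched in a few lines. The version below is an illustration under stated assumptions, not the authors' implementation: sub-problems are solved with SciPy's `linprog` instead of CVXGEN-generated solvers, the threshold $t^*$ is refreshed after every sub-problem solve, a small tolerance `tol` absorbs solver round-off in the optimality check, and the names `linf_fit` and `fast_linf_fit` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def linf_fit(A, y):
    """Minimax fit via the LP (6); returns (beta, optimal residual)."""
    n, d = A.shape
    c = np.r_[np.zeros(d), 1.0]
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]), np.hstack([-A, -ones])])
    b_ub = np.r_[y, -y]
    bounds = [(None, None)] * d + [(0, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[-1]

def fast_linf_fit(A, y, max_iter=100, tol=1e-7):
    """Sketch of Algorithm 2: only (d+1)- or (d+2)-row sub-problems are solved."""
    n, d = A.shape
    # Step 2: start from the d+1 largest residuals of the LS solution.
    beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    order = np.argsort(-np.abs(A @ beta_ls - y))
    active = list(order[:d + 1])                      # I_s
    beta, t = linf_fit(A[active], y[active])          # step 3
    for _ in range(max_iter):
        res = np.abs(A @ beta - y)
        res[active] = -np.inf                         # restrict to I_r
        im = int(np.argmax(res))                      # most violated measurement
        if res[im] <= t + tol:                        # no violation: optimal
            break
        active.append(im)                             # sub-problem of size d+2
        beta, t = linf_fit(A[active], y[active])
        r_act = np.abs(A[active] @ beta - y[active])
        active.pop(int(np.argmin(r_act)))             # drop the inactive measurement
    return beta, t

rng = np.random.default_rng(1)
A = np.column_stack([rng.uniform(-1, 1, 50), np.ones(50)])
y = A @ np.array([1.0, -0.5]) + rng.standard_normal(50)
beta_fast, t_fast = fast_linf_fit(A, y, max_iter=1000)
beta_full, t_full = linf_fit(A, y)
print(t_fast, t_full)
```

On termination by the optimality check, every residual is bounded by the sub-problem optimum, so the value returned should agree with the full LP up to solver tolerance, while each LP solved along the way has at most $d + 2$ measurements.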



6. Experimental Results 

In this section, we first illustrate the effectiveness and ef- 
ficiency of our algorithm on several classic geometric model 
fitting problems. Then the proposed method is evaluated 
on face recognition problems with both artificial and nat- 
ural contiguous occlusions. Finally, we test our method 
on the iris recognition problem, where both segmenta- 
tion error and occlusions are present. For comparison, we 
also evaluate several other representative robust regression 
methods on face recognition problems. 

Once the outliers have been removed from the data set, any solver can be used to obtain the final model estimate. We implemented the original minimax algorithm using the Matlab package CVX [34] with the SeDuMi solver [18], while the proposed fast algorithm was implemented using solvers generated by CVXGEN [20]. All experiments are conducted in Matlab running on a PC with a Quad-Core 3.07GHz CPU and 12GB of RAM, using mex to call the solvers from CVXGEN. Note that the algorithm makes no special effort to use multiple cores, though Matlab itself may do so if possible.

6.1. Geometric model fitting 

6.1.1. Line fitting 

Figure 1 shows estimation performance when our algorithm is used for outlier removal and the line is subsequently estimated via least squares, on data generated under two different error models.

We generate $A \in \mathbb{R}^{n \times d}$ randomly and $\beta \in \mathbb{R}^{d}$ randomly. We then set the first $k$ error terms $e_j$, $j \in [1, \ldots, k]$, as independent standard normal random variables. We set the last $n - k$ error terms $e_j$, $j \in [k + 1, \ldots, n]$, as independent chi-squared random variables with 5 degrees of freedom. We also test using the two-sided contamination model, which sets the sign of the last $n - k$ variables randomly such that the outliers lie on both sides of the true regression line. In both cases we set $y = A\beta + e$.
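The data-generation scheme above can be sketched as follows. The distribution of the entries of A, the use of an intercept column, and the function name `make_line_data` are assumptions for illustration, since the text only states that A and β are generated randomly.

```python
import numpy as np

def make_line_data(n=100, k=70, two_sided=False, seed=0):
    """Generate y = A beta + e with k Gaussian inlier errors and n - k chi-squared outlier errors."""
    rng = np.random.default_rng(seed)
    # Assumed design: one random regressor plus an intercept column (d = 2).
    A = np.column_stack([rng.uniform(0, 10, n), np.ones(n)])
    beta = rng.standard_normal(2)
    e = np.empty(n)
    e[:k] = rng.standard_normal(k)            # inlier errors, N(0, 1)
    e[k:] = rng.chisquare(5, n - k)           # outlier errors, chi-squared with 5 d.o.f.
    if two_sided:
        # Two-sided contamination: flip the sign of each outlier error at random.
        e[k:] *= rng.choice([-1.0, 1.0], n - k)
    y = A @ beta + e
    return A, beta, y, e

A1, beta1, y1, e1 = make_line_data(two_sided=False)
A2, beta2, y2, e2 = make_line_data(two_sided=True)
print(e1[70:].min() > 0, (e2[70:] < 0).any())
```

In the one-sided model every outlier error is positive, so the outliers all lie above the true line; in the two-sided model they fall on both sides.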

As can be seen in Figure 1, our method detects all of the outliers and consequently generates a line estimate which fits the inlier set well for both noise models, whilst the estimate obtained with the outliers included achieves a reasonable estimate only for the two-sided contamination case, where the outliers are evenly distributed on both sides of the line.



6.1.2. Ellipse fitting 

An example of the performance of our method applied to ellipse fitting is shown in Figure 2. 100 points were sampled uniformly around the perimeter of an ellipse centred at (0, 0), and were then perturbed via offsets drawn from $\mathcal{N}(0, 1)$. 30 outliers were randomly drawn from an approximately uniform distribution within the bounding box shown in Figure 2. The result of the method, again shown in Figure 2, shows that our method has correctly identified the inlier and outlier sets, and demonstrates that the centre and radius estimated by our method are accurate.

6.1.3. Efficiency 

Next we compare the computational efficiency of the standard L∞ norm outlier removal process and of the proposed fast algorithm.

Figure 1: Two examples demonstrating the performance of our algorithm on data contaminated on one side (left) and two sides (right). Outliers detected by our algorithm are marked with circles. The solid black line is the result after outlier removal and the red dashed line shows the result with outliers included. In these two cases, we set n = 100 and k = 70, where n and k are described in Section 6.1.1.

Figure 2: An example of the performance of our method in ellipse fitting. Points identified as outliers are marked by circles.

Method      Feature dimension
            2       4       6       8       10
original    1.926   1.984   1.994   2.016   2.039
fast        0.031   0.045   0.068   0.096   0.128

Table 2: Computation time (in seconds) comparison of the original and fast algorithms implemented with different solvers. Data is generated as 200 observations with dimension varying from 2 to 10.



For the line fitting problem we generate the data using the scheme described previously. We initially fix the data dimension d = 2 and increase the problem size n from 20 to 10000. The outlier fraction, k, is set to 90% of n. A comparison of the running time for these two algorithms is shown in Table 1. The fast algorithm finishes 70 to 80 times faster than the original algorithm. Specifically, with n = 10000 the fast algorithm finishes in approximately 4 seconds, whilst the original algorithm requires more than 5 minutes. In this case, the proposed fast approach is about 80 times faster than the conventional approach.

Second, we fix the number of observations n = 200 and vary the data dimension d from 2 to 10. Execution times are shown in Table 2. Consistent with the last experiment, the fast algorithm completes far more rapidly than the original algorithm in all situations. When d = 2, the proposed fast algorithm is more than 60 times faster than the original algorithm. With a larger dimension d = 10, the proposed fast algorithm takes only 0.128 seconds to complete while the original algorithm requires more than 2 seconds.

6.2. Robust face recognition 

In this section, we test our method on face recognition problems from 3 datasets: AR [35], Extended Yale B [36], and CMU-PIE [37]. A range of state-of-the-art algorithms are compared to the proposed method. Recently, sparse representation based classification (SRC) [28] obtained excellent performance for robust face recognition problems, especially with contiguous occlusions. The SRC problem solves $\min \|\beta\|_1$, s.t. $\|y - A\beta\|_2 \le \epsilon$, where $A$ is the training data from all classes, $\beta$ is the corresponding coefficient vector and $\epsilon > 0$ is the error tolerance. To handle occlusions, SRC is extended to $\min \|\omega\|_1$, s.t. $\|y - B\omega\|_2 \le \epsilon$, where $B = [A, I]$ and $\omega = [\beta, e]$. $I$ and $e$ are the identity matrix and error vector respectively. SRC assigns the test image to the class with smallest residual: $\mathrm{identity}(y) = \arg\min_i \|y - A\delta_i(\beta)\|_2$. Here $\delta_i(\beta)$ is a vector whose only nonzero entries are the entries in $\beta$ corresponding to the $i$-th class. We also evaluate the following two methods which are related to our method. Most recently, a method called Collaborative Representation-based Classification (CRC) was proposed in [27], which relaxes the L1 norm to the L2 norm. Linear regression classification (LRC) [1] casts face recognition as a simple linear regression problem: $\min \|\beta_i\|_1$, s.t. $y = A_i\beta_i$, where $A_i$ and $\beta_i$ are the training data and representative coefficients with respect to class $i$. LRC selects the class with smallest residual: $\mathrm{identity}(y) = \arg\min_i \|y - A_i\beta_i\|_2$. Both CRC and LRC achieved competitive or even better results than SRC [1], [27] in some cases.

For the purposes of the face recognition experiments, 
outlier pixels are first removed using our method, leaving 
the remaining inlier set to be processed by any regression 
based classifier. In the experiments listed below, LRC has 
been used for this purpose due to its computational effi- 
ciency. 
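The residual-based decision rule used after outlier removal can be sketched as below. This is a simplified illustration that assumes each class-wise representation is obtained by ordinary least squares on the retained pixels; the function name `lrc_classify`, the optional `keep` argument, and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np

def lrc_classify(class_mats, y, keep=None):
    """Return (index of the class with the smallest residual, list of residuals).

    class_mats: list of (pixels x samples) training matrices, one per class.
    keep: optional index array of pixels retained after outlier removal.
    """
    if keep is not None:
        y = y[keep]
    residuals = []
    for A_i in class_mats:
        if keep is not None:
            A_i = A_i[keep]
        # Class-wise least-squares representation of the probe image.
        beta_i, *_ = np.linalg.lstsq(A_i, y, rcond=None)
        residuals.append(float(np.linalg.norm(y - A_i @ beta_i)))
    return int(np.argmin(residuals)), residuals

rng = np.random.default_rng(0)
# Two synthetic classes, each spanned by 3 training vectors of 20 "pixels".
classes = [rng.standard_normal((20, 3)) for _ in range(2)]
probe = classes[1] @ np.array([0.5, -1.0, 2.0])   # lies in the span of class 1

label, res = lrc_classify(classes, probe)
print(label, res)

# Corrupt four pixels, then classify using only the remaining pixels.
corrupted = probe.copy()
corrupted[:4] += 100.0
label_masked, _ = lrc_classify(classes, corrupted, keep=np.arange(4, 20))
print(label_masked)
```

Restricting both the probe and the training matrices to the retained pixel indices is what allows any regression-based classifier to be applied unchanged once the outlier pixels have been identified.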

Many robust regression estimators have been developed in the statistics literature. In this section, we also compare two other popular estimators, namely least median of squares (LMS) [8] and the MM-estimator [38], both of which have high-breakdown points and do not require the number of outliers to be removed to be specified. The comparison is conducted on face recognition problems with both artificial and natural occlusions. These methods are first used to estimate the coefficient $\beta$ and face images are recognized by the minimal residuals.

Figure 3: Detecting the sunglasses occlusion of two example images with dimension 27 × 20 from the AR dataset. Each row shows an original image of a particular subject, followed by an image where outlying pixels have been automatically marked in white and finally a reconstructed image of the subject.

Method          Feature dimension
                54       130      300      540
LRC             21.0%    38.5%    54.5%    60.0%
SRC             48.0%    67.5%    69.5%    64.5%
CRC             22.0%    35.5%    44.5%    56.0%
MM-estimator    0.5%     8.5%     21%      24%
LMS             9%       25%      37.5%    48%
our method      43.0%    85.0%    99.5%    100%

Table 3: Accuracy rates (%) of different methods on the AR dataset with sunglasses occlusion. The various feature dimensions correspond to downsampling the original 165 × 120 pixel images to 9 × 6, 13 × 10, 20 × 15 and 27 × 20, respectively.

6.2.1. Face recognition despite disguise

The AR dataset [35] consists of over 4000 facial images from 126 subjects (70 men and 56 women). For each subject 26 facial images were taken in two separate sessions, 13 per session. The images exhibit a number of variations including various facial expressions (neutral, smile, anger, and scream), illuminations (left light on, right light on and all side lights on), and occlusion by sunglasses and scarves. Of the 126 subjects available, 100 have been randomly selected for testing (50 males and 50 females) and the images cropped to 165 × 120 pixels. 8 images of each subject with various facial expressions, but no occlusions, were selected for training. Testing was carried out on 2 images of each of the selected subjects wearing sunglasses. Figure 3 shows two typical images from the AR dataset with the outliers (30% of all the pixels in the face images) detected by our method set to white. The reconstructed images are shown as the third and sixth images.

The images were downsampled to produce features of 54, 130, 300, and 540 dimensions respectively. Table 3 shows a comparison of the recognition rates of various methods. Our method exhibits superior performance to LRC, CRC and SRC in all except the lowest feature dimension case. Specifically, with feature dimension 540, the proposed method achieves a perfect accuracy of 100%, which outperforms LRC, CRC and SRC by 40%, 44% and 35.5% respectively. The MM-estimator and LMS failed to achieve good results on this dataset, mainly because the residuals of outliers severely affect the final classification, although relatively accurate coefficients could be estimated. In this face recognition application, the final classification is based on the fitting residual. These results highlight the ability of our method for outlier removal, which can significantly improve face recognition performance.

Figure 4: Detecting the randomly placed square monkey face (the first image) in an example face image (the second image) from the Extended Yale B dataset. Outliers (30% of the whole image) detected by our method are marked as white pixels (the third image) and the reconstructed image is shown on the right.

6.2.2. Contiguous block occlusions 

In order to evaluate the performance of the algorithm
in the presence of artificial noise and larger occlusions, we
now describe tests in which large regions of the original
image are replaced by pixels from another source. The
Extended Yale B dataset [36] was used as the source of
the original images and consists of 2414 frontal face images
from 38 subjects under various lighting conditions.
The images are cropped and normalized to 192 × 168 pixels
[39]. Following [28], we choose Subsets 1 and 2 (715
images) for training and Subset 3 (451 images) for testing.
In our experiment, all the images are downsampled to
24 × 21 pixels. We replace a randomly selected region,
covering between 10% and 50% of each image, with a square
monkey face. Figure 4 shows the monkey face, an example
of an occluded image, the outlying pixels detected by our
method and a reconstructed copy of the input image.
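
To make the occlusion protocol concrete, the following sketch places a square block covering roughly a given fraction of a 24 × 21 image at a random position. It is an illustration under stated assumptions: the helper names `square_block` and `occlude` are hypothetical, the block is filled with a constant value rather than the monkey face image, and the side length is rounded to whole pixels, so the covered area only approximates the requested fraction.

```python
import math
import random

def square_block(height, width, fraction, rng):
    """Pick a square region covering roughly `fraction` of the image.

    Returns (top, left, side). The side is rounded to whole pixels and
    clipped so that the square always fits inside the image.
    """
    side = round(math.sqrt(fraction * height * width))
    side = min(side, height, width)
    top = rng.randrange(height - side + 1)
    left = rng.randrange(width - side + 1)
    return top, left, side

def occlude(image, patch_value, top, left, side):
    """Return a copy of `image` (a list of rows) with the block overwritten."""
    out = [row[:] for row in image]
    for r in range(top, top + side):
        for c in range(left, left + side):
            out[r][c] = patch_value
    return out

rng = random.Random(0)                      # fixed seed for repeatability
height, width = 24, 21
image = [[0] * width for _ in range(height)]
top, left, side = square_block(height, width, 0.30, rng)
occluded = occlude(image, 255, top, left, side)
covered = sum(v == 255 for row in occluded for v in row)
print(side, covered / (height * width))
```

For a nominal 30% occlusion the square has a side of 12 pixels, so 144 of the 504 pixels (about 28.6%) are actually replaced.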

Table 4 compares the recognition rates of the
different methods, averaged over five separate runs. Our
proposed method outperforms all other methods in all
conditions. With small occlusions, all methods achieve
high accuracy; however, the performance of LRC, SRC
and CRC deteriorates dramatically as the size of the
occlusion increases. In contrast, our method is robust in the
presence of outliers. In particular, with 30% occlusion our
method obtains 98.2% accuracy while the recognition rates of
all the other methods are below 80%. With 50% occlusion,
all other methods show low performance, while the
accuracy of our method is still above 85%. From
Table 4 we can also see that the proposed method is more
stable in terms of accuracy variation, which is mainly
because the outliers are effectively detected.

6.2.3. Partial face features on the CMU PIE dataset 

As shown in the previous example, occlusion can significantly
reduce face recognition performance, particularly
in methods without outlier removal. Wright et al. [28]



Method        Occlusion rate
              10%           20%           30%           35%           40%           50%
LRC           98.5 ± 0.18   88.6 ± 0.18   68.8 ± 3.3    60.4 ± 3.2    51.8 ± 1.3    41.7 ± 1.7
SRC           99.0 ± 0.3    94.8 ± 1.2    77.2 ± 1.5    67.3 ± 1.1    56.2 ± 2.2    45.9 ± 1.5
CRC           98.6 ± 0.7    90.3 ± 1.7    74.9 ± 2.6    67.2 ± 1.0    58.3 ± 1.8    48.3 ± 2.0
our method    99.8 ± 0.1    99.3 ± 0.1    98.2 ± 1.0    96 ± 1.2      91.4 ± 0.7    85.5 ± 0.7

Table 4: Mean and standard deviation of recognition accuracies (%) over 5 runs in the presence of randomly placed block occlusions of images from the
Extended Yale B dataset.
Figure 5: Detection of the dead pixels,
covering about 13% of the input image. Each row shows a distinct
example drawn from the CMU-PIE dataset. Outliers detected by our
method are marked as white pixels in the centre column, followed by
the reconstructed images in the last column.



attempt to identify faces on the basis of particular sub- 
images, such as the area around the eye or ear, etc. Here, 
we use the complete face and remove increasing portions 
of the bottom half of the image, so that initially the neck 
is obscured, followed by the chin and mouth, etc. The re- 
moval occurs by setting the pixels to be black. Thus, the 
complete image is used as a feature vector, and a subset 
of the elements is set to zero. A second experiment is per- 
formed following the same procedure, but to the central 
section of the face, thus initially obscuring the nose, then 
the ears, eyes and mouth, etc. 
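
The masking procedure described above can be sketched as follows. This is a minimal illustration that assumes the removed region is a band of complete pixel rows; the exact shape of the region is not specified in the text, and the function name `black_out_rows` and the toy 24 × 24 image are hypothetical.

```python
def black_out_rows(image, fraction, region="bottom"):
    """Return a copy of `image` with a horizontal band of rows set to 0.

    `image` is a list of pixel rows. `fraction` is the share of rows to
    remove; `region` selects a band at the bottom or centred in the middle.
    """
    height = len(image)
    n_rows = round(fraction * height)
    if region == "bottom":
        start = height - n_rows
    elif region == "middle":
        start = (height - n_rows) // 2
    else:
        raise ValueError("region must be 'bottom' or 'middle'")
    out = [row[:] for row in image]
    for r in range(start, start + n_rows):
        out[r] = [0] * len(out[r])
    return out

image = [[128] * 24 for _ in range(24)]          # uniform grey toy image
bottom = black_out_rows(image, 0.25, "bottom")   # rows 18..23 set to black
middle = black_out_rows(image, 0.25, "middle")   # rows 9..14 set to black

# The complete image, zeros included, is still used as the feature vector.
feature_vector = [p for row in bottom for p in row]
print(feature_vector.count(0) / len(feature_vector))
```

The feature vector keeps its full length in both cases; only the values of the occluded elements change.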



In this experiment, we use the CMU-PIE dataset [37],
which contains 68 subjects and a total of 41368 face images.
Each person has their picture taken with 13 different
poses, 43 different illumination conditions, and 4 different
expressions. In our experiment all of the face images
are aligned and cropped, with 256 gray levels per pixel [40],
and finally resized to 24 × 24 pixels. Here we use the subset
containing images of pose C27 (a nearly frontal pose) and
we use the data from the first 20 subjects, each subject
with 21 images. The first 15 images of each subject are
used for training and the last 6 images for testing. The
test images are preprocessed so that one part (bottom or
middle) of each face (from 10% to 40% of pixels) is set to
black. See Figure 5 for examples.

The recognition rates of the different methods when the
bottom area of the image is occluded are reported in Table 5.
When the occluded area is small, all methods except the
MM-estimator obtain perfect 100% recognition rates. When
the occluded area increases to 20% of the image size, the accuracy
of LRC drops to 80%, because the black pixels
bias the linear regression estimate. The technique used
in SRC mentioned above performs better than LRC when
occlusions are present, achieving 98.3% accuracy. However,
our method is able to achieve 100% accuracy with that
level of occlusion. While CRC achieves a relatively
good result (83.3%) for 30% occlusion, the accuracies of
the other methods (including the robust methods MM-estimator
and LMS) drop dramatically. In contrast, our method still
achieves 100% accuracy, which demonstrates its robustness
against heavy occlusion. The comparison of these methods
for occlusion in the middle part of
faces is shown in Table 6. These results again show the
robustness of our method against heavy occlusions. Almost
all the methods show lower accuracy than in the former
situation. Such a result leads to the conclusion that
information from the middle part of a face (the area around the nose) is
more discriminative than that from the bottom part (the area
around the chin) for face recognition.



Method          Percentage of image removed
                10%       20%       30%
LRC             100%      80.0%     61.7%
SRC             100%      98.3%     71.7%
CRC             100%      96.7%     83.3%
MM-estimator    99.2%     41.7%     15.0%
LMS             100%      77.5%     58.3%
our method      100%      100%      100%

Table 5: Recognition accuracies of various methods on the CMU-PIE
dataset with dimension 24 × 24. 10% to 30% of pixels in the bottom
area are replaced with black.



Method          Percentage of image removed
                10%       20%       30%
LRC             100%      91.7%     35.0%
SRC             100%      93.3%     65.0%
CRC             98.3%     87.5%     53.3%
MM-estimator    92.5%     7.5%      0.8%
LMS             100%      80.8%     22.5%
our method      100%      99.2%     90.0%

Table 6: Recognition accuracies of various methods on the CMU-PIE
dataset with dimension 24 × 24. 10% to 30% of pixels in the middle
area are replaced with black.



6.2.4. Efficiency

For the problem of identifying outliers in face images,
we compare the computational efficiency using the AR face
dataset, as described in Section 6.2.1. We vary the feature



Method      Feature dimension
            54        300       1200      4800       19800
original    2.051     9.894     48.371    396.323    5150.137
fast        0.113     0.566     2.564     18.811     361.689



Table 7: Computation time (in seconds) of the original and fast 
algorithms when applied to the AR face dataset. 



dimension from 54 to the original 19800. Table 7 shows
the execution time for both the proposed fast algorithm
and the original method. We can see that the fast
algorithm outperforms the original in all situations. With
low dimensional features, below 4800, the fast algorithm
is approximately 20 times faster than the original. When
the feature dimension increases to 19800, the original
algorithm needs about 1.43 hours while the fast algorithm
takes only about 6 minutes.
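
The speedup figures quoted above follow directly from the timings in Table 7, as the short calculation below shows. The variable names are chosen for this illustration only.

```python
# Timings in seconds copied from Table 7 (AR face dataset).
dims = [54, 300, 1200, 4800, 19800]
original = [2.051, 9.894, 48.371, 396.323, 5150.137]
fast = [0.113, 0.566, 2.564, 18.811, 361.689]

# Ratio of original to fast run time for each feature dimension.
speedups = [o / f for o, f in zip(original, fast)]
for d, s in zip(dims, speedups):
    print(f"dimension {d:>5}: {s:.1f}x faster")

hours_original = original[-1] / 3600   # about 1.43 hours at dimension 19800
minutes_fast = fast[-1] / 60           # about 6 minutes at dimension 19800
print(f"{hours_original:.2f} h vs {minutes_fast:.1f} min")
```

The ratios lie between roughly 17 and 21 for dimensions up to 4800, and drop to about 14 at the full dimension of 19800.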

6.3. Robust iris recognition 

Iris recognition is a commonly used non-contact biometric
technique for automatically identifying a person.
Occlusions can also occur in iris data acquisition,
especially in unconstrained conditions, caused by eyelids,
eyelashes, segmentation errors, etc. In this section we test
our method against segmentation errors, which can result
in outliers from eyelids or eyelashes. Specifically, we take
the ND-IRIS-0405 dataset [41], which contains 64,980 iris
images obtained from 356 subjects with a wide variety
of distortions. In our experiment, each iris image is
segmented by detecting the pupil and iris boundaries using
the open-source package of Masek and Kovesi [42]. 80
subjects were selected, and 10 images from each subject
were chosen for training and 2 images for testing. To test
outlier detection, segmentation errors and artificial
occlusions were placed on the iris area, in a similar fashion to
[30]. A few example images and their detected iris and
pupil boundaries are shown in Figure 6. The feature vector
is obtained by warping the circular iris region into a
rectangular block by sampling with a radial resolution of 20
and an angular resolution of 240. These blocks
are then resized to 10 × 60. For our method,
10% of pixels are detected and removed when test images
contain only segmentation errors, and a corresponding
additional number of pixels is removed for artificial
occlusions.
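
The warping step can be pictured with a short sketch. The code below assumes circular pupil and iris boundaries and linear interpolation between them (a rubber-sheet style mapping); the text does not spell out these details, and the function name `unwrap_coordinates` and the example circles are hypothetical.

```python
import math

def unwrap_coordinates(pupil, iris, radial_res=20, angular_res=240):
    """Sampling grid that maps the annular iris region to a rectangle.

    `pupil` and `iris` are (cx, cy, radius) circles. Entry [i][j] of the
    result is the (x, y) image position for radial step i and angle j,
    interpolated linearly between the pupil and iris boundaries.
    """
    px, py, pr = pupil
    ix, iy, ir = iris
    grid = []
    for i in range(radial_res):
        t = (i + 0.5) / radial_res   # 0 at the pupil edge, 1 at the iris edge
        row = []
        for j in range(angular_res):
            theta = 2.0 * math.pi * j / angular_res
            x_in, y_in = px + pr * math.cos(theta), py + pr * math.sin(theta)
            x_out, y_out = ix + ir * math.cos(theta), iy + ir * math.sin(theta)
            row.append(((1 - t) * x_in + t * x_out, (1 - t) * y_in + t * y_out))
        grid.append(row)
    return grid

# Hypothetical boundaries: pupil radius 30 and iris radius 90 around (160, 120).
grid = unwrap_coordinates((160.0, 120.0, 30.0), (160.0, 120.0, 90.0))
print(len(grid), len(grid[0]), grid[0][0])
```

Reading the image intensity at each grid position yields a 20 × 240 block, which can then be resized as described above. Pixels that fall on eyelids or eyelashes because of segmentation errors end up as outlying entries of this block.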

The recognition results are summarized in Table 8. SRC,
as used in [30] for iris recognition, and LRC are compared
with our method. We can clearly see that the proposed
method achieved the best results with all feature dimensions.
Specifically, our method achieves 96.3% accuracy
when the iris images contain only segmentation errors, while
the accuracy for LRC is 89.5%. SRC performs well (95.6%)
on this task. However, when 10% additional occlusions occur
in the test images, the performance of LRC and SRC
drops dramatically to 43.8% and 61.3% respectively, while
our method still achieves the same result of 96.3% as before.
Figure 6: Three example images from the ND-IRIS-0405 dataset.
Features are extracted from the iris area, which lies between the
detected iris and pupil boundaries shown as red and blue circles
respectively. The first two iris images suffer from increasing
segmentation errors while the third one suffers from both segmentation error
and artificial occlusion.



Method        Percentage of artificial occlusion
              0%        10%       20%
LRC           89.5%     43.8%     20.1%
SRC           95.6%     61.3%     51.3%
our method    96.3%     96.3%     95%



Table 8: Classification accuracies on the ND iris dataset. Note that
0% artificial occlusion means that only segmentation errors
are present.



When occlusions increase to 20%, our method still obtains
a high accuracy of 95%, which is higher than those of LRC
and SRC by 74.4% and 43.7% respectively.

6.3.1. Efficiency 

Table 9 shows the computation time comparison of
different methods on the iris recognition problem. Consistent
with the former results, the proposed algorithm is much
more efficient than the original algorithm.



Method      Image resolution
            5 × 20    5 × 60    10 × 120    20 × 240
original    2.400     6.692     31.942      255.182
fast        0.131     0.385     1.941       14.906



Table 9: Computation time (in seconds) comparison of the origi- 
nal and fast algorithms on the Iris data set with different feature 
resolutions (shown in the first row). 



7. Discussion 

The main drawback of our method is that one has to
first estimate the outlier percentage empirically, as is done
in many other robust regression methods. In fact, to
our knowledge, almost all outlier removal methods require
the outlier percentage, or some other parameter such as a
residual threshold, to be set in advance. This is in contrast
with robust regression methods that use a robust loss
such as the Huber function, or even a nonconvex loss. These
methods do not need the outlier percentage to be specified.

One may ask how the proposed algorithm performs
with an under- or over-estimated p. Taking the AR
dataset as an example, we evaluate our method by varying
p from 25% to 45%. From Table 10 we can see that the
proposed method is not very sensitive to the pre-estimated
outlier percentage when p is over 30%. We also observe
that our method becomes more stable when the image
resolution is higher. This is mainly because, as mentioned
before, visual recognition problems generally supply a large
number of pixels in high-dimensional images, and
consequently it is more crucial to reject as many outliers as
possible than to keep all inliers.
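
To illustrate the role of the parameter p, the sketch below converts an outlier fraction into a pixel budget and flags the entries with the largest absolute residuals. It is a simplification for illustration only: the proposed method removes outliers iteratively rather than by a single thresholding pass, and the helper names and the residual values are hypothetical.

```python
def outlier_budget(n_pixels, p):
    """Number of pixels to discard for a pre-set outlier fraction p."""
    return round(p * n_pixels)

def flag_largest_residuals(residuals, p):
    """Indices of the round(p * n) entries with the largest |residual|."""
    k = outlier_budget(len(residuals), p)
    order = sorted(range(len(residuals)),
                   key=lambda i: abs(residuals[i]), reverse=True)
    return sorted(order[:k])

# Pixel budgets for the two feature sizes of Table 10 at p = 30%.
budget_small = outlier_budget(13 * 10, 0.30)   # 130 pixels in total
budget_large = outlier_budget(20 * 15, 0.30)   # 300 pixels in total
print(budget_small, budget_large)

# Hypothetical residuals for a 10-pixel toy example.
residuals = [0.1, -4.0, 0.2, 0.05, 3.5, -0.3, 0.0, 0.15, -0.2, 0.1]
flagged = flag_largest_residuals(residuals, 0.30)
print(flagged)
```

An over-estimated p discards some inliers along with the outliers, which matters less when many pixels remain, whereas an under-estimated p leaves outliers in the fit.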

In contrast to our approach, there exist many robust estimators
that do not require the number of outliers to be specified,
such as MM-estimator [H], LMS Q and DPM [l^. These 
methods can also be applied to visual recognition problems 
as we have shown in Section 6. However, the difference is
that our method can directly identify the outliers, which
can help compute more reliable residuals for classification,
as shown in Section 6. Of course, for these methods,
observations can be detected as outliers when the corresponding
standardized residuals exceed a cutoff point, which, however,
also has to be determined a priori.



Dimension    Percentage of removed pixels
             25%      30%      35%      40%      45%
13 × 10      82%      85%      92.5%    95%      94.5%
20 × 15      98%      99.5%    100%     100%     99.5%

Table 10: Recognition accuracies on the AR dataset with different
percentages of outliers removed by our method. The feature
dimension is set to 13 × 10 and 20 × 15.



8. Conclusion 

In this work, we have proposed an efficient method for
$L_\infty$-norm-based robust least-squares fitting,
and hence for iteratively removing outliers. The
efficiency of the method allows it to be applied to visual
recognition problems which would normally be too large
for such an approach. The method takes advantage of the
nature of the $L_\infty$ norm to break the main problem into
more manageable sub-problems, which can then be solved
via standard, efficient techniques.
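
The idea of removing outliers by repeated $L_\infty$ fitting can be illustrated on the simplest possible model, a single scalar estimated from noisy measurements. In this case the $L_\infty$ fit is the midrange of the data and can be computed in closed form. The sketch below is a toy instance under that assumption, not the regression algorithm proposed in this paper; the function names and data are hypothetical, and removing every measurement that attains the maximum residual is one common variant of the scheme.

```python
def linf_location(values):
    """Minimise max_i |v_i - c| over c; the optimum is the midrange."""
    lo, hi = min(values), max(values)
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def remove_outliers_linf(values, n_remove, tol=1e-12):
    """Repeat the L-infinity fit, dropping the points with maximal residual.

    Returns the indices kept and the indices removed. Each pass may remove
    several points, because the maximum residual is attained more than once.
    """
    active = list(range(len(values)))
    removed = []
    while len(removed) < n_remove and len(active) > 1:
        centre, max_res = linf_location([values[i] for i in active])
        support = [i for i in active
                   if abs(values[i] - centre) >= max_res - tol]
        removed.extend(support)
        active = [i for i in active if i not in support]
    return active, removed

# Inliers near 1.0 plus two gross outliers (hypothetical data).
data = [1.0, 1.2, 0.9, 1.1, 9.0, 1.05, -7.0, 0.95]
active, removed = remove_outliers_linf(data, n_remove=2)
estimate, max_residual = linf_location([data[i] for i in active])
print(sorted(removed), estimate, max_residual)
```

Because the measurements with maximal residual are discarded at every pass, a data set with a single gross outlier would also lose one extreme inlier, which is why a large supply of measurements makes the scheme attractive.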

The efficiency of the technique and the benefits that out- 
lier removal can bring to visual recognition problems were 
highlighted in the experiments, with the computational ef- 
ficiency and accuracy of the resultant recognition process 
easily beating all other tested methods. 

Like many other robust fitting methods, the proposed
method needs a parameter: the number of outliers to be
removed. One may determine this value heuristically.
Although the results are not very sensitive to this value for
the visual recognition problems considered here, in the future
we plan to investigate how to automatically estimate the
outlier rate in noisy data.